Using sensitivity analysis and visualization techniques to open black box data mining models

Authors

  • Paulo Cortez
  • Mark J. Embrechts
Abstract

In this paper, we propose a new visualization approach based on a Sensitivity Analysis (SA) to extract human understandable knowledge from supervised learning black box data mining models, such as Neural Networks (NNs), Support Vector Machines (SVMs) and ensembles, including Random Forests (RFs). Five SA methods (three of which are purely new) and four measures of input importance (one novel) are presented. Also, the SA approach is adapted to handle discrete variables and to aggregate multiple sensitivity responses. Moreover, several visualizations for the SA results are introduced, such as input pair importance color matrix and variable effect characteristic surface. A wide range of experiments was performed in order to test the SA methods and measures by fitting four well-known models (NN, SVM, RF and decision trees) to synthetic datasets (five regression and five classification tasks). In addition, the visualization capabilities of the SA are demonstrated using four real-world datasets (e.g., bank direct marketing and white wine quality).

© 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ins.2012.10.039
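To make the idea of sensitivity analysis on a black box model concrete, the following is a minimal sketch of a generic one-at-a-time analysis. It is not the paper's implementation: the function names (`one_d_sensitivity`, `range_importance`), the choice of the mean as baseline, the number of levels, and the toy model are all assumptions made for illustration. Each input is varied over its observed range while the others are held fixed, and the spread of the model's responses is taken as a relative importance score.

```python
from statistics import mean


def one_d_sensitivity(model, data, levels=7):
    """Vary one input at a time over its observed range.

    All other inputs are held at their mean (an assumed baseline).
    Returns one list of model responses per input.
    """
    n_inputs = len(data[0])
    baseline = [mean(row[j] for row in data) for j in range(n_inputs)]
    responses = []
    for j in range(n_inputs):
        lo = min(row[j] for row in data)
        hi = max(row[j] for row in data)
        # Evenly spaced levels between the observed minimum and maximum.
        grid = [lo + (hi - lo) * k / (levels - 1) for k in range(levels)]
        ys = []
        for value in grid:
            x = list(baseline)
            x[j] = value
            ys.append(model(x))
        responses.append(ys)
    return responses


def range_importance(responses):
    """Relative importance: response range per input, normalized to sum to 1."""
    ranges = [max(ys) - min(ys) for ys in responses]
    total = sum(ranges)
    return [r / total if total else 0.0 for r in ranges]


def toy_model(x):
    """Hypothetical stand-in for a fitted black box; the third input is ignored."""
    return 3.0 * x[0] + 0.5 * x[1] ** 2 + 0.0 * x[2]


data = [[0.0, -1.0, 5.0], [1.0, 0.0, 6.0], [2.0, 1.0, 7.0]]
importance = range_importance(one_d_sensitivity(toy_model, data))
```

Because the model is only called, never inspected, the same sketch applies to any fitted predictor that exposes a prediction function. The recorded responses per input could also be plotted against the level values to show the shape of each input's effect.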

Similar resources

Identification of the Patient Requirements Using Lean Six Sigma and Data Mining

Lean health care is one of the new management approaches that put the patient at the core of each change. Lean construction is based on visualization for understanding and prioritizing improvements. When only visualization techniques are used, much important information can be missed. In order to prioritize and select improvements, it’s essential to integrate new analysis tools to achieve a good unders...

Prediction of Blasting Cost in Limestone Mines Using Gene Expression Programming Model and Artificial Neural Networks

The use of blasting cost (BC) prediction to achieve optimal fragmentation is necessary in order to control the adverse consequences of blasting such as fly rock, ground vibration, and air blast in open-pit mines. In this research work, BC is predicted through collecting 146 blasting data from six limestone mines in Iran using the artificial neural networks (ANNs), gene expression programming (G...

Prediction of mortality in patients admitted to intensive care units, A comparison of three data mining techniques: a brief report.

Background: Early outcome prediction of hospitalized patients is critical because the intensivists are constantly striving to improve patients' survival by taking effective medical decisions about ill patients in Intensive Care Units (ICUs). Despite rapid progress in medical treatments and intensive care technology, the analysis of outcomes, including mortality prediction, has been a challenge ...

Visualization and Concept Drift Detection Using Explanations of Incremental Models

The temporal dimension that is ever more prevalent in data makes data stream mining (incremental learning) an important field of machine learning. In addition to accurate predictions, explanations of the models and examples are a crucial component, as they provide insight into the model’s decisions and lessen its black box nature, thus increasing the user’s trust. Proper visual representation of data...

Back analysis of inter-ramp slope failure in Teghout copper mine

Slope stability is one of the most important issues in open pit mining design. The main purpose of any open pit mine design is to propose an optimal excavation configuration, considering safety, ore recovery and financial return. An accurate pit slope design which accounts for the mine geology, structural geology, rock mass properties, and hydrogeology models of the mine area, drastically enhan...

Journal:
  • Inf. Sci.

Volume 225

Publication date: 2013